Understanding Functions in R

NHS-R Community Workshop

Fran Barton

2024-02-22

Welcome!

Some intentions for the session:

  • I want to get you writing some functions
  • I want you to ‘get’ something about the way a function works
  • I want you to feel safe to ask questions and try things out.

Intended outcome:

  • I hope as a result of today you will be able to write your own functions, as and when you find them useful.

Structure

  • Writing very simple functions
  • What is a function anyway?
  • Function arguments
  • Using functions to improve repetitive code
  • Why are functions useful for analysts?
  • Time for questions / chat

Over to RStudio…

  • Writing a very (very!) basic function

R code

01_initial.R
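
A minimal sketch of what a very basic function might look like. The name `say_hello` and its message are made up for illustration and are not taken from 01_initial.R.

```r
# A function with no arguments: it does the same thing every time
say_hello <- function() {
  "Hello, NHS-R!"
}

# Defining the function does nothing visible; we have to call it, with brackets
greeting <- say_hello()
print(greeting)
```

The last expression inside the curly brackets is what the function hands back.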

  • Writing some more pretty basic functions
  • Playing with arguments
  • Variables inside and outside the function
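
As a rough illustration of the last two points (the names `multiply`, `add_one` and `by` are hypothetical, chosen only for this sketch): an argument can have a default value, and a variable changed inside a function is separate from one with the same name outside it.

```r
# 'by' has a default, so the caller can leave it out
multiply <- function(x, by = 2) {
  x * by
}

multiply(5)         # uses the default: 10
multiply(5, by = 3) # overrides the default: 15

# A variable called x exists outside the function...
x <- 10

add_one <- function(x) {
  x <- x + 1 # ...but this only changes the function's own x
  x
}

result <- add_one(x)
# result is 11, while the x outside the function is still 10
```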

What is a function anyway?

“An encapsulation of a task”?

An Orion space capsule shown orbiting Earth

Wikimedia Commons

One way of thinking about arguments

The function, in a way, doesn’t need to care about what’s under the mask.

Whatever it is, just (try to) do this to it.
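
One way to picture this in code, as a sketch only (`double_it` is a hypothetical function): the argument `x` is the mask, and the function applies the same step to whatever is behind it.

```r
# The function only ever sees 'x', whatever value is behind it
double_it <- function(x) {
  x * 2
}

double_it(4)          # a single number
double_it(c(1, 2, 3)) # a vector of numbers

# "Just (try to) do this to it": text can't be multiplied, so this attempt fails
attempt <- tryCatch(double_it("four"), error = function(e) e)
inherits(attempt, "error") # TRUE
```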

Meet Fenella …

A security guard looking stern

Fenella’s job is to check that the parcel has the right label on it. If it does, she will allow it into the warehouse (function).

A humble parcel

Fenella’s job description

  • The labels that Fenella checks against are defined when you create the function
  • Fenella doesn’t look inside the box
  • The contents of the box only get dealt with inside the warehouse (function)
  • What happens if you try to bring a parcel in that has an unexpected label?
  • What happens if you don’t supply a box for one of the expected labels?
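
The two questions above can be tried out with a small sketch. The function `store_parcel` and its labels `parcel` and `shelf` are hypothetical; `tryCatch()` is used only so the errors are captured instead of stopping the script.

```r
# The labels Fenella checks are set here, when the function is created
store_parcel <- function(parcel, shelf) {
  paste("Stored", parcel, "on shelf", shelf)
}

# Labels match: the parcel is let in
ok <- store_parcel(parcel = "books", shelf = "A")

# Unexpected label: stopped at the door with an "unused argument" error
wrong_label <- tryCatch(
  store_parcel(parcel = "books", shelf = "A", colour = "red"),
  error = function(e) conditionMessage(e)
)

# Missing box: the error appears once the warehouse tries to use 'shelf'
missing_box <- tryCatch(
  store_parcel(parcel = "books"),
  error = function(e) conditionMessage(e)
)

print(wrong_label)
print(missing_box)
```

Note the difference: an unexpected label is rejected before the function body runs at all, whereas a missing box is only noticed when the code inside the function first needs it.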

To the code again…

R code

02_arguments.R

The carrier bag theory of function(s)

“An encapsulation of a task”? A carrier bag?

A Poundland carrier bag and a seagull

Ben Sutherland

  • A function can contain a lot of tasks
  • A function can just do one task
  • How big you make your functions is a matter for you 😊

Put it in the bag

If you have code that looks like this:

source("data_input1.R")
source("data_input2.R")
data3 <- read.csv("more_data.csv")
data_out <- rbind(data1, data2, data3)

...

… you might want to put it all in a function

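get_data <- function() {
  # local = TRUE runs each script inside the function, so data1 and data2
  # are created here rather than in the global environment
  source("data_input1.R", local = TRUE)
  source("data_input2.R", local = TRUE)
  data3 <- read.csv("more_data.csv")
  # the last expression is what the function returns
  rbind(data1, data2, data3)
}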

data_out <- get_data()
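
A possible further step, sketched under assumptions: if all the inputs were CSV files, the file paths could become an argument, so the same bag can carry different files. The function `get_csv_data` and the temporary files below are hypothetical and exist only so the sketch runs on its own.

```r
# The file paths are now an argument rather than being fixed inside the function
get_csv_data <- function(paths) {
  # read each file into a data frame, then stack them row-wise
  data_list <- lapply(paths, read.csv)
  do.call(rbind, data_list)
}

# Two small temporary CSV files, created here purely for demonstration
path1 <- tempfile(fileext = ".csv")
path2 <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, value = c(10, 20)), path1, row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)), path2, row.names = FALSE)

data_out <- get_csv_data(c(path1, path2))
print(data_out)
```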

Using functions to tame repetitive code

“If you find yourself writing the same thing more than twice, turn it into a function.”

To which I would add,

“If you find yourself using a function in more than one project, add it to a package.”

An example (1)

library(dplyr)

starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

An example (2)

droids_mean_height <- starwars |>
  filter(species == "Droid") |>
  summarise(droid_mean_height = mean(height, na.rm = TRUE))

droids_mean_height
# A tibble: 1 × 1
  droid_mean_height
              <dbl>
1              131.

ewoks_mean_height <- starwars |>
  filter(species == "Ewok") |>
  summarise(droid_mean_height = mean(height, na.rm = TRUE))

ewoks_mean_height
# A tibble: 1 × 1
  droid_mean_height
              <dbl>
1                88

Try again…

droids_mean_height <- starwars |>
  filter(species == "Droid") |>
  summarise(droid_mean_height = mean(height, na.rm = TRUE))

droids_mean_height
# A tibble: 1 × 1
  droid_mean_height
              <dbl>
1              131.

ewoks_mean_height <- starwars |>
  filter(species == "Ewok") |>
  summarise(ewok_mean_height = mean(height, na.rm = TRUE))

ewoks_mean_height
# A tibble: 1 × 1
  ewok_mean_height
             <dbl>
1               88

Convert common code to a function (1)

Original code
droids_mean_height <- starwars |>
  filter(species == "Droid") |>
  summarise(droid_mean_height = mean(height, na.rm = TRUE))

First steps: name it and wrap it

Name it and wrap it
get_species_mean_height <- function() {
  starwars |>
    filter(species == "Droid") |>
    summarise(droid_mean_height = mean(height, na.rm = TRUE))
}

Note that we don’t need to worry about what the output will be called (if anything). The user of the function will decide that when they run it.
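
A possible next step, offered as a sketch of where this refactoring could lead and not as the contents of 03_refactoring.R: replace the hard-coded "Droid" with an argument. The argument name `species_name` and the column name `mean_height` are made up for illustration.

```r
library(dplyr)

# The species is now an argument, and the output column has a neutral name
get_species_mean_height <- function(species_name) {
  starwars |>
    filter(species == species_name) |>
    summarise(mean_height = mean(height, na.rm = TRUE))
}

# The caller decides what each result is called
droids_mean_height <- get_species_mean_height("Droid")
ewoks_mean_height <- get_species_mean_height("Ewok")
```

With a neutral column name there is no longer a `droid_mean_height` label to forget to change, which was the copy-and-paste slip in the earlier example.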

Back to RStudio…

R code

03_refactoring.R

Advantages of refactoring to a function

  • You can spot efficiencies
  • You can be very clear what the inputs and outputs ought to be
  • You might be able to pass it to lapply() or purrr::map(), if useful.
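
To illustrate the last point, here is a rough sketch using `lapply()`. The helper `species_mean_height` is hypothetical; it returns a single number so the results are easy to collect.

```r
library(dplyr)

# Hypothetical helper: mean height for one species, returned as a single number
species_mean_height <- function(species_name) {
  starwars |>
    filter(species == species_name) |>
    summarise(mean_height = mean(height, na.rm = TRUE)) |>
    pull(mean_height)
}

species_to_check <- c("Droid", "Ewok", "Human")

# lapply() calls the function once per species and returns a list of results
mean_heights <- lapply(species_to_check, species_mean_height)
names(mean_heights) <- species_to_check

print(mean_heights)
```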

Getting functions to ‘click’

I remember it taking me a while, when I first tried understanding functions, for them to ‘click’ for me.

I didn’t understand the difference between variables inside the function and outside it. I didn’t really get what the point was.

I didn’t realise the power of the abstraction of using variables (as function arguments), instead of just passing in values.

Eventually I realised that this means your code can work harder for you. It can do its thing in different situations, and work with different inputs.
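
A small sketch of that difference, using made-up functions (`greet_fran` and `greet` are hypothetical): the first has its value baked in, the second takes a variable as an argument and so works in more situations.

```r
# Value baked in: this function can only ever greet one person
greet_fran <- function() {
  paste("Hello,", "Fran")
}

# Value abstracted into an argument: the same code now works for anyone
greet <- function(name) {
  paste("Hello,", name)
}

greet_fran()                    # "Hello, Fran"
greet("Fenella")                # "Hello, Fenella"
greet(c("Fran", "Fenella"))     # works on a whole vector of names too
```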

Why are functions useful?

To quote Hadley Wickham & co.:

R for Data Science, Functions section

  1. You can give a function an evocative name that makes your code easier to understand.
  2. As requirements change, you only need to update code in one place, instead of many.
  3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
  4. It makes it easier to reuse work from project-to-project, increasing your productivity over time.

How can functions help us do our jobs well?

  1. Naming
  2. Efficiency
  3. Accuracy
  4. Transferability

Useful resources

Some things we haven’t covered

  • Pros and cons of writing longer or shorter functions
  • Checking function arguments
  • Passing functions to other functions, e.g. purrr::map()
  • Using anonymous (one-off) functions in dplyr::mutate(), for example
  • Writing tests for functions
  • Creating an R package

Thanks for attending!